73 research outputs found
parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
Recent advances in big data and analytics research have provided a wealth of
large data sets that are too big to be analyzed in their entirety, due to
restrictions on computer memory or storage size. New Bayesian methods have been
developed for large data sets that are only large due to large sample sizes;
these methods partition big data sets into subsets, and perform independent
Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then
combine the independent subset posterior samples to estimate a posterior
density given the full data set. These approaches were shown to be effective
for Bayesian models including logistic regression models, Gaussian mixture
models and hierarchical models. Here, we introduce the R package
parallelMCMCcombine which carries out four of these techniques for combining
independent subset posterior samples. We illustrate each of the methods using a
Bayesian logistic regression model for simulation data and a Bayesian Gamma
model for real data; we also demonstrate features and capabilities of the R
package. The package assumes the user has carried out the Bayesian analysis and
has produced the independent subposterior samples outside of the package. The
methods are primarily suited to models with unknown parameters of fixed
dimension that exist in continuous parameter spaces. We envision this tool will
allow researchers to explore the various methods for their specific
applications, and will assist future progress in this rapidly developing field.
Comment: for published version see:
http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD
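The partition-then-combine idea described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the package's implementation: it assumes a scalar parameter, a flat prior, and normal data with known unit variance, so that each subposterior can be sampled exactly in place of running MCMC. The function names `consensus_combine` and `simulate` are hypothetical, and the combination rule shown (inverse-variance weighted averaging of draws) is only one of several possible rules.

```python
import random
import statistics


def consensus_combine(subsamples):
    """Combine M lists of scalar subposterior draws into one list of draws.

    Each combined draw is the inverse-variance weighted average of the
    t-th draw from every subset (an assumed, illustrative combination rule).
    """
    weights = [1.0 / statistics.variance(s) for s in subsamples]
    total = sum(weights)
    n_draws = min(len(s) for s in subsamples)
    return [
        sum(w * s[t] for w, s in zip(weights, subsamples)) / total
        for t in range(n_draws)
    ]


def simulate(seed=0, n=6000, n_subsets=4, n_draws=4000):
    """Simulate data, split it into subsets, and combine subposterior draws."""
    rng = random.Random(seed)
    data = [rng.gauss(2.0, 1.0) for _ in range(n)]

    # Partition the data into disjoint subsets of equal size.
    subsets = [data[m::n_subsets] for m in range(n_subsets)]

    # Assumption: flat prior and known unit variance, so the subposterior
    # for the mean is N(subset mean, 1 / subset size). We draw from it
    # directly; in practice these draws would come from independent MCMC runs.
    subsamples = [
        [rng.gauss(statistics.fmean(s), (1.0 / len(s)) ** 0.5)
         for _ in range(n_draws)]
        for s in subsets
    ]
    return data, consensus_combine(subsamples)


if __name__ == "__main__":
    data, combined = simulate()
    print("full-data mean:      ", statistics.fmean(data))
    print("combined draws mean: ", statistics.fmean(combined))
    print("combined draws var:  ", statistics.variance(combined))
    print("full posterior var:  ", 1.0 / len(data))
```

In this toy setting the full-data posterior is N(sample mean, 1/n), so the mean and variance of the combined draws can be compared against it directly.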
Asymptotic properties and approximation of Bayesian logspline density estimators for communication-free parallel computing methods
In this article we perform an asymptotic analysis of Bayesian parallel
density estimators which are based on logspline density estimation. The
parallel estimator we introduce is in the spirit of a kernel density estimator
introduced in recent studies. We provide a numerical procedure that produces
the density estimator itself in place of the sampling algorithm. We then derive
an error bound for the mean integrated squared error for the full data
posterior density estimator. We also investigate the parameters that arise from
logspline density estimation and the numerical approximation procedure. Our
investigation identifies specific choices of parameters for logspline density
estimation that result in the error bound scaling appropriately in relation to
these choices.
Comment: 33 pages, 11 figures
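For orientation, the setting usually assumed in communication-free parallel computing can be written down as follows. The notation is an assumption introduced here and does not come from the abstract: the data x are split into M disjoint subsets x_1, ..., x_M that are conditionally independent given the parameter, the prior is split evenly across subsets, and a hat denotes an estimator built from subset samples.

```latex
% Full-data posterior as a product of subposteriors (assumed notation)
p(\theta \mid x) \;\propto\; p(\theta) \prod_{m=1}^{M} p(x_m \mid \theta)
  \;=\; \prod_{m=1}^{M} p_m(\theta \mid x_m),
\qquad
p_m(\theta \mid x_m) \;\propto\; p(\theta)^{1/M}\, p(x_m \mid \theta).

% Plug-in estimator and the error criterion named in the abstract
\hat{p}(\theta \mid x) \;\propto\; \prod_{m=1}^{M} \hat{p}_m(\theta \mid x_m),
\qquad
\mathrm{MISE}(\hat{p}) \;=\;
  \mathbb{E} \int \bigl( \hat{p}(\theta \mid x) - p(\theta \mid x) \bigr)^{2} \, d\theta .
```

Under this reading, each subset density estimate would be a logspline fit, and the error bound in the abstract concerns the MISE of the resulting product estimator.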
Shared Molecular Features of Inherited and Sporadic ALS/FTD
Amyotrophic Lateral Sclerosis (ALS) and Frontotemporal Dementia (FTD) are two devastating neurodegenerative diseases in urgent need of therapeutic intervention. The last seven years have been a period of great progress in understanding these disorders separately and as a disease spectrum. Most notable is the discovery of the hexanucleotide GGGGCC expansion in the C9ORF72 (C9) gene, which is the greatest known cause of inherited and sporadic forms of these two diseases. In response to this groundbreaking discovery, we set out to elucidate the molecular mechanisms of C9 pathogenesis with a focus on the expanded RNA transcripts derived from the C9 expansion. Our two primary goals have been to contribute to the worldwide efforts to understand the primary toxic insults of this mutation that will ultimately shape therapeutic development, and to identify molecular criteria that can be used to define new links between these diseases and undetermined genetic factors.
In the introduction, we review the broad conceptual links between RNA binding proteins (RBPs), mRNA regulation, and neurodegeneration. This review contains substantial discussion of ALS, FTD, and C9, as well as related neurodegenerative, neuromuscular and repeat expansion diseases. In addition to providing a detailed history of molecular mechanisms proposed for these disorders, this section serves as a justification for our focus on the C9 RNA, RBP sequestration, and altered splicing that we describe in the following chapters.
Chapter two consists of our 2016 eLife paper on sequestration of the RBP hnRNP H and the resulting splicing changes in individuals with C9ALS-FTD. In this paper, we sought to identify the most biochemically sound candidate for the proposed RBP sequestration hypothesis. We found that the splicing factor hnRNP H binds with high affinity to the repeat sequence and likely has a role in regulating the transition of the repeat RNA from linear to G-quadruplex (G-Q) conformation. Importantly, we identified functional deficiency of this protein in patient brains, as evidenced by dysregulation of known hnRNP H splicing targets and loss of soluble hnRNP H.
Chapter three consists of recently submitted work on the molecular links between C9ALS/FTD and sporadic ALS/FTD at large. Building upon our findings in C9ALS-FTD, we asked whether the changes to hnRNP H that we predicted would occur in C9 expansion carriers as a result of the repeat RNA might also occur independently of this expansion. We found that half of all patients in a cohort of 50 sporadic ALS, ALS-FTD, and FTD brains demonstrated hnRNP H sequestration and accompanying splicing changes, a pattern we refer to as like-C9. Like-C9 patients may be thought of as phenocopies of C9 expansion carriers, in that they not only present with similar clinical symptoms, but also possess remarkably similar molecular signatures of RBP dysfunction. While the genomic origins of like-C9 remain unknown, we propose that they are suggestive of repeat expansions analogous to C9, much like what is seen in DM1 and DM2, and HD and HDL2 (discussed in Ch. 1). This work has provided the foundation for our ongoing search for novel genomic expansions that confer increased susceptibility to ALS/FTD.
Lying About Terrorism
Conventional wisdom holds that terrorism is committed for strategic reasons as a form of costly signaling to an audience. However, since over half of terrorist attacks are not credibly claimed, conventional wisdom does not explain many acts of terrorism. This article suggests that there are four lies about terrorism that can be incorporated in a rationalist framework: false claiming, false flag, the hot-potato problem, and the lie of omission. Each of these lies about terrorism can be strategically employed to help a group achieve its desired goal(s) without necessitating that an attack be truthfully claimed.
Bayesian models for pooling microarray studies with multiple sources of replications
BACKGROUND: Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently. RESULTS: We introduce a Bayesian hierarchical model to pool cDNA microarray data across multiple independent studies to identify highly expressed genes. Each study has multiple sources of variation, i.e., replicate slides within repeated identical experiments. Our model produces the gene-specific posterior probability of differential expression, which provides a direct method for ranking genes, and provides Bayesian estimates of false discovery rates (FDR). In simulations combining two and five independent studies, with fixed FDR levels, we observed large increases in the number of discovered genes in pooled versus individual analyses. When the number of output genes is fixed (e.g., top 100), the pooled model found appreciably more truly differentially expressed genes than the individual studies. We were also able to identify more differentially expressed genes from pooling two independent studies in Bacillus subtilis than from each individual data set. Finally, we observed that in our simulation studies our Bayesian FDR estimates tracked the true FDRs very well. CONCLUSION: Our method provides a cohesive framework for combining multiple but not identical microarray studies with several sources of replication, with data produced from the same platform. We assume that each study contains only two conditions: an experimental and a control sample. We demonstrated our model's suitability for a small number of studies that have been either pre-scaled or have no outliers.
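The link between gene-specific posterior probabilities, gene ranking, and a Bayesian FDR estimate can be illustrated with a short sketch. It assumes the commonly used estimate in which the FDR of a reported list is the average of one minus the posterior probability of differential expression over the genes in that list; the abstract does not spell out the paper's exact estimator, so this is an assumption. The function names `bayesian_fdr` and `largest_list_at_fdr` and the example probabilities are hypothetical.

```python
def bayesian_fdr(post_probs, k):
    """Estimated FDR of the list made of the k top-ranked genes.

    post_probs holds each gene's posterior probability of differential
    expression; genes are ranked from highest to lowest probability.
    """
    if not 1 <= k <= len(post_probs):
        raise ValueError("k must be between 1 and the number of genes")
    top = sorted(post_probs, reverse=True)[:k]
    # Each selected gene contributes its posterior probability of being
    # a false discovery, 1 - p; the estimate is the average over the list.
    return sum(1.0 - p for p in top) / k


def largest_list_at_fdr(post_probs, target):
    """Largest list size whose estimated FDR is at most target (0 if none)."""
    ranked = sorted(post_probs, reverse=True)
    best, running = 0, 0.0
    for k, p in enumerate(ranked, start=1):
        running += 1.0 - p
        if running / k <= target:
            best = k
    return best


if __name__ == "__main__":
    # Made-up posterior probabilities for six genes, for illustration only.
    probs = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]
    print(bayesian_fdr(probs, 3))
    print(largest_list_at_fdr(probs, 0.10))
```

Fixing the list size (for example, the top 100 genes) corresponds to calling `bayesian_fdr` with a chosen `k`, while fixing the FDR level corresponds to `largest_list_at_fdr`.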
Formation of a TBX20-CASZ1 protein complex is protective against dilated cardiomyopathy and critical for cardiac homeostasis
By the age of 40, one in five adults without symptoms of cardiovascular disease is at risk for developing congestive heart failure. Within this population, dilated cardiomyopathy (DCM) remains one of the leading causes of disease and death, with nearly half of cases genetically determined. Though genetic and high-throughput sequencing-based approaches have identified sporadic and inherited mutations in a multitude of genes implicated in cardiomyopathy, how combinations of asymptomatic mutations lead to cardiac failure remains a mystery. Since a number of studies have implicated mutations of the transcription factor TBX20 in congenital heart diseases, we investigated the underlying mechanisms, using an unbiased systems-based screen to identify novel, cardiac-specific binding partners. We demonstrated that TBX20 physically and genetically interacts with the essential transcription factor CASZ1. This interaction is required for survival, as mice heterozygous for both Tbx20 and Casz1 die post-natally as a result of DCM. A Tbx20 mutation associated with human familial DCM sterically interferes with the TBX20-CASZ1 interaction and provides a physical basis for how this human mutation disrupts normal cardiac function. Finally, we employed quantitative proteomic analyses to define the molecular pathways mis-regulated upon disruption of this novel complex. Collectively, our proteomic, biochemical, genetic, and structural studies suggest that the physical interaction between TBX20 and CASZ1 is required for cardiac homeostasis, and further, that reduction or loss of this critical interaction leads to DCM. This work provides strong evidence that DCM can be inherited through a digenic mechanism.
RNA-seq in the tetraploid Xenopus laevis enables genome-wide insight in a classic developmental biology model organism
Advances in sequencing technology have significantly reshaped the landscape of developmental biology research. The dissection of genetic networks in model and nonmodel organisms has been greatly enhanced by high-throughput sequencing technologies. RNA-seq has revolutionized the ability to perform developmental biology research in organisms without a published genome sequence. Here, we describe a protocol for developmental biologists to perform RNA-seq on dissected tissue or whole embryos. We start with the isolation of RNA and generation of sequencing libraries. We further show how to interpret and analyze the large amount of sequencing data that is generated in RNA-seq. We explore the abilities to examine differential expression, gene duplication, transcript assembly, alternative splicing and SNP discovery. For the purposes of this article, we use Xenopus laevis as the model organism to discuss uses of RNA-seq in an organism without a fully annotated genome sequence.
Xenopus: An emerging model for studying congenital heart disease
Congenital heart defects affect nearly 1% of all newborns and are a significant cause of infant death. Clinical studies have identified a number of congenital heart syndromes associated with mutations in genes that are involved in the complex process of cardiogenesis. The African clawed frog, Xenopus, has been instrumental in studies of vertebrate heart development and provides a valuable tool to investigate the molecular mechanisms underlying human congenital heart diseases. In this review, we discuss the methodologies that make Xenopus an ideal model system to investigate heart development and disease. We also outline congenital heart conditions linked to cardiac genes that have been well-studied in Xenopus and describe some emerging technologies that will further aid in the study of these complex syndromes.